E cient Planning in MDPs by Small Backups
نویسندگان
چکیده
E cient planning plays a crucial role in model-based reinforcement learning. Traditionally, the main planning operation is a full backup based on the current estimates of the successor states. Consequently, its computation time is proportional to the number of successor states. In this paper, we introduce a new planning backup that uses only the current value of a single successor state and has a computation time independent of the number of successor states. This new backup, which we call a small backup, opens the door to a new class of model-based reinforcement learning methods that exhibit much finer control over their planning process than traditional methods. We empirically demonstrate that this increased flexibility allows for more e cient planning by showing that an implementation of prioritized sweeping based on small backups achieves a substantial performance improvement over classical implementations.
منابع مشابه
On MABs and Separation of Concerns in Monte-Carlo Planning for MDPs
Linking online planning for MDPs with their special case of stochastic multi-armed bandit problems, we analyze three state-of-the-art Monte-Carlo tree search algorithms: UCT, BRUE, and MaxUCT. Using the outcome, we (i) introduce two new MCTS algorithms, MaxBRUE, which combines uniform sampling with Bellman backups, and MpaUCT, which combines UCB1 with a novel backup procedure, (ii) analyze them...
متن کاملDiscovering hidden structure in factored MDPs
Markov Decision Processes (MDPs) describe a wide variety of planning scenarios ranging from military operations planning to controlling a Mars rover. However, today’s solution techniques scale poorly, limiting MDPs’ practical applicability. In this work, we propose algorithms that automatically discover and exploit the hidden structure of factored MDPs. Doing so helps solve MDPs faster and with...
متن کاملTopological Value Iteration Algorithm for Markov Decision Processes
Value Iteration is an inefficient algorithm for Markov decision processes (MDPs) because it puts the majority of its effort into backing up the entire state space, which turns out to be unnecessary in many cases. In order to overcome this problem, many approaches have been proposed. Among them, LAO*, LRTDP and HDP are state-of-theart ones. All of these use reachability analysis and heuristics t...
متن کاملPlanning in Stochastic Domains: Problem Characteristics and Approximations (version Ii)
This paper is about planning in stochastic domains by means of partially observable Markov decision processes (POMDPs). POMDPs are di cult to solve and approximation is a must in real-world applications. Approximation methods can be classi ed into those that solve a POMDP directly and those that approximate a POMDP model by a simpler model. Only one previous method falls into the second categor...
متن کاملVariable Independence in Markov Decision Problems
In decision-theoretic planning, the problem of planning under uncertainty is formulated as a multidimensional, or factoredMDP. Traditional dynamic programming techniques are ine cient for solving factored MDPs whose state and action spaces are exponential in the number of the state and action variables, correspondingly. We focus on exploiting problems' structure imposed by variable independence...
متن کامل